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Key-words: linear solvers, Krylov subspace methods, preconditioning, filtering property, block incomplete 
decomposition 



* INRIA Saclay - He de France, Laboratoire de Recherche en Informatique Universite Paris-Sud 11, France 
(laura. grigoriOinria.f r). 

t Laboratoire J.L. Lions, CNRS UMR7598, Universite Paris 6, France (natafOaim. jussieu.fr). 



Centre de recherche INRIA Saclay - Ile-de-France 
Pare Orsay Universite 
4, i-ue Jacques Monod, 91893 ORSAY Cedex 

Telephone : +33 1 72 92 59 00 



Decomposition a base de filtrage generalisee 



Resume : Ce document presente une nouvelle technique de preconditionnement adapte pour les matrices 
issues de la discretisation d'un systeme d'equations aux derivees partielles sur des maillages non structures. 
Le preconditionneur satisfait une propriete dite de filtrage, qui signifie que la matrice d'entree est identique 
au preconditionneur pour un vecteur donne de filtrage. Le choix de ce vecteur permet d'attenuer I'efFet des 
modes de basse frequence sur la convergence et ainsi de diminuer ou d'eliminer le plateau qui est souvent 
observe dans la convergence des methodes iteratives. En particulier, le document presente une approche 
generale qui permet d'assurer que la propriete de filtrage est satisfaite lors d'une decomposition matricielle. 
La matrice d'entree pent avoir une structure creuse arbitraire. Ainsi, elle pent etre renumerotee en utilisant 
la methode de dissection emboitee, afin de permettre un calcul parallele du preconditionneur et du processus 
iteratif. 

Mots-cles : solvcurs lincaircs, methode de sous-espaces de Krylov, preconditionnement, propriete de 
filtrage, decomposition incomplete par blocs 
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1 Introduction 

Iterative methods are widely used in industrial applications, and preconditioning these methods is an impor- 
tant topic which has already been extensively studied ^|9jj3j. In this context, algebraic multigrid methods 
are very successful for certain classes of applications, in particular scalar PDEs [SI El HH IH US] ■ They are 
known to have good weak scalability properties, but they are not strongly scalable. This motivates research 
on iterative solvers for systems of PDEs and/or large number of processors. 

Several highly used preconditioners, as the incomplete LU factorizations and domain decomposition 
methods, are known to have scalability problems, in terms of both problem size and number of processors. 
This is often due to the presence of several low frequency modes that hinder the convergence of the iterative 
method. To solve this problem, a different class of so called filtering preconditioners has been proposed 
[1] m [ini [TT] , where the choice of the filtering vector is made to alleviate the effect of low frequency modes 
on the convergence. For domain decomposition methods, coarse grid correction is known to be mandatory 
for solving the scalability problem [T3] . 

In this paper we focus on the generalization and suitability for parallel computing of the filtering precondi- 
tioner. This preconditoner is an incomplete factorization where it is possible to ensure that the factorization 
will coincide with the original matrix for some specified vector, called a filtering vector. Satisfying this filter- 
ing condition is an important factor for accelerating the convergence of the iterative method. The previous 
research on these methods considered only matrices arising from the discretization of PDEs on structured 
grids, where the matrix has a block tridiagonal structure [U [2j [TBI [T^ • To the best of our knowledge, there 
was no previous result on the parallelization of filtering preconditioners. One of the important results of this 
research is the development of a new and general approach to ensure that a filtering condition is satisfied in 
a matrix decomposition. This approach is based on an innovative way of organizing the computations that 
allows on one side to satisfy a filtering property and on another side to perform a parallel computation. This 
approach has been used to develop a preconditioner based on a block approached decomposition, that we 
refer to as block filtering preconditioner. While we discuss in detail the right filtering property At = Alt, a 
similar approach can be used to develop a preconditioner that satisfies the left filtering property t'^A = t'^M, 
where A is the input matrix, M is the preconditioner and t is the filtering vector. 

This preconditioner does not impose any particular structure on the input matrix. To allow its usage 
on parallel architectures, the input matrix can be reordered using nested dissection. This reordering allows 
a parallel implementation of the construction of the preconditioner, as well as of the iterative process. 

The preconditioner can be seen as a generalization for unstructured grids of the preconditioner presented 
in [2] for block tridiagonal matrices. In contrast to the preconditioner presented in [2] that has been shown 
to be efficient in combination with ILUO, the block preconditioner presented here is efficient as a stand-alone 
preconditioner. 

The goal of this paper is only to present the algebraic framework which allows a filtering condition to be 
satisfied in an incomplete block factorization. The numerical results showing the efficiency of the proposed 
preconditioner and its parallel performance will be presented in a future paper. 



2 Block Filtering Decomposition 



In this section we describe a block filtering preconditioner M which satisfies the right filtering condition 
{AI — A)t — 0, where t is a filtering vector. 

Consider a matrix A of size n x n partitioned into a block matrix of size N x N with square diagonal 
blocks (not necessarily of a same size) 



/An 



A 



\Ani 

An exact block LDU factorization of A is written as 



A = 



L21 D22 



J 



D22 
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where Da ,i — 1 . . . N are square invertible matrices of size bi x bi with bi < n . Let D = Block-Diag(Z3i i ,...,£) at jv ) , 
and let 

\ / f/ii . . 



L = 



( 



Nl 



L 



U = 



N.N-l 



/ 



(7i 



N 



\ 



\ 







The factorization ([T]) can be written a.s A = {L + D)D^^{U + D). In the following we refer to the blocks 
of L, D, U as C, where — Cij if i > j, Uij = Cij if i < j, and Dij = Cij if i ~ j. In other words, the 
matrix C can be written a.s C = L + D + U. The blocks of L, D, and U are computed using the following 
formula, with i, j = 1 . . . N: 



Aij i = 1 or j = 1 



A, 



Emin(ij') — 1 j- 7-, — Itt ■ ^ -. • ^ -1 

k=uL,,=io,u,,=io LtkD,^,^ Ukj, i> 1 or j>l 



(2) 



In practice, even if the matrix A is very sparse, the factors L, D, U can be much denser. In particular 
the term Li^D'^^Ukj can introduce an important amount of fill-in in the factors. In our work, the goal is 
to approximate the inverse of the diagonal blocks D^^, fc = 1 . . . n by a sparse matrix such that LikD^^Ukj 
stays sparse. In the context of filtering decomposition, there are mainly two approximations used. Consider 
a diagonal block Dkk- The first approach consists of approximating D'^^ by a sparse matrix F = (3, chosen 
such that a filtering condition is satisfied. The second approach aims at identifying a better approximation of 
-^kk starting from (3. As described in section[3l this leads to an approximation of the form F = 2j3 — j3DkkP 
[2], where /3 is a diagonal matrix. We will discuss both approaches, but we note that the first approach is 
more stable and leads to better results in practice. 

In the following we explain the construction of the block filtering preconditioner M . We first give its 
definition, and then explain more in detail the reasoning that lead to its construction. In section [3] we 
discuss the construction of the approximation F of the inverse of the block diagonal matrices. 

Definition 2.1 Let A be a matrix of size n x m. For k — 1 . . . N , let Lk be a matrix of size n x rik, Dk be 
an invertible matrix of size rik x Uk, and Uk be a matrix of size rik x m. Let M be a matrix defined by 



M - A = 



N 

E 

k=l 



LkDZ^Uk 



N 

E 

k=l 



LkFkUk- 



A construction that enabes filtering is a construction where Fk, k = 1 . 
relation 

FkUkt = D-^Ukt for all k^l: N 



. N are matrices that satisfy the 

(3) 



Definition 2.2 Let t be a filtering vector of size n and let A be a matrix of size n x n partitioned into a 
block matrix of size N x N . A block filtering decomposition is defined as 



M = 



L21 



D2 



JJV,JV-1 



V 



D- 
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I Dn U12 
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JV-1,JV-1 



UlN 
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(4) 

where Du^i = 1. . .N are square invertible matrices of size bi x bi with bi < n. In more compact form, 
M = LDU , where AI, L, D, U are block matrices of size N xN . Let C = L + D + U and lett ~ {ti;t2'T . ■ - tN). 
The blocks are computed as 



Ci' 



Aij i = 1 or j = 1 
An 



Emin(ij') — 1 T Ft • ^ i • ^ i 



(5) 



where Fkj is a sparse approximation of D^.^ that satisfies 



FkjUkjtj 



^kk^kjtj for all k = 1 : min(i, j) - 1 with L^k 7^ 0, Ukj ^ 



(6) 



If Ukjtj is a vector of nonzero elements, a matrix Fkj that satisfies the condition in equation ^ can be 
computed as 

Fkj = Diag{{D^^Ukjtj).lUkjtj) 
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where ./ is the pointwise division (see for exaraple equation (15) in [2]). However, for very sparse matrices, 
Ukj can have rows of ah zeros, and hence the resuh of Ukjtj can be a vector with zero elements. We present 
in section [3] a construction of Fkj that solves this problem. 

The main idea in the design of the preconditioner in Definition [52] is to ensure that each block satisfies an 
appropriate filtering condition Mijtj = Aijtj, such that the global filtering condition Mt = At is satisfied, 
where t = {ti\t2 \ ■ ■ - tiq) is the filtering vector. We note B — M — A, and so we want to ensure that for 
each block Bijtj = 0. This is different from the approach used for block tridiagonal systems [5], where 
B = M — Ais, a. block diagonal matrix. The matrix B = M — A\s formed by (i3ij)i<i j<Ar, with 

min(ij)-l 

Bij = Cij + ^ ^ikD^k^k] — Aij. (7) 

The construction of M ensures that for each block Bij, for each term LikD^^Ukj of the summation in 
Equation[71 F^j is chosen such that the filtering is satisfied. That is, LikD^kUkjij = LikFkjUkjtj- From 
this the formula of Fkj in Equation |6] is deduced. This ensures that the global filtering for the whole matrix 
is satisfied. Note that there is a Fkj for each nonzero block Ukj , that is the approximation of the diagonal 
block depends on the off-diagonal blocks of U. We give a formal proof in the following lemma. 

Lemma 2.1 Consider an nxn matrix A and a filtering vector t of size n. If the block filtering preconditioner 
M as defined in Definition \2.S\ exists, then it satisfies the filtering property, that is Mt — At. 

Proof. The preconditioner M satisfies the right filtering property if for each nonzero block Bij we have 
Bijtj = 0, where Bij is of size 6,; x bj and tj is a vector of bj elements. In the formula of Bij from equation 
(O, we replace the expression of dj from equation ([5]). We obtain: 

min(i,j) — 1 min(i,j) — 1 \ 

^io^j = I X! ^ikDl^tJkj ~ ^ LikFkjUkj j tj = 

min(ij)-l \ 

^ LikD~i^{I - DkkFkj)Ukj j = 
■ 

We give now a definition of the block filtering preconditioner, in which the inverse of a diagonal block 
matrix Dkk is approximated by 2Fkj — FkjDkkFkj- We show that if the matrix Fkj satisfies the same 
condition as in equation ([5]), the preconditioner satisfies the filtering property. 

Definition 2.3 A block filtering preconditioner M of a matrix A of size n x n is defined by an incomplete 
block factorization 



M = 



L21 D22 



\ Lni ■ ■ ■ Ln,n-i Dnn / 



( ^" \ 



^22 



V D-\ J 



/ Dii Ul2 ... f/lAT \ 

Dn-1,N-1 Un-1,N 

\ Dnn / 



(8) 

and a filtering vector t of size n, where Da^i — \ ...N are square invertible matrices of size bi x bi with 
bi < n. In more compact form, M — LDU, where M,L,D,U are block matrices of size N x N. Let 
C = L + D + U and let t — {ti]t2] ■ ■ ■ ^at). The blocks are computed as 

{Aij i = 1 or j = 1 
A., - ESK1^U..#o ^^^(2^'^. - Fk,DkkFk,)Uk„ ^>l or j>l 

where Fkj is a sparse approximation of Dkk that satisfies 

FkjUkjtj = D'^k^kjtj (10) 

Lemma 2.2 Consider an nxn matrix A and a filtering vector t of size n. If the block filtering preconditioner 
M as defined in Definition \2.S\ exists, then it satisfies the filtering property, that is Mt = At. 
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Proof. We use the same approach as in Lemma 12.11 The prcconditioner M satisfies the right filtering 
property if for each nonzero block Bij we have Bijtj =0. In the formula of Bij from equation ([7]), we 
replace the expression of Cij from equation ([9]). We obtain: 

(min(z,j) — 1 min{i.j) — l \ 

X! ^ikD^ktJkj - ^ Lik{'2Fkj - Fk]DkkFkj)Ukj \ tj = 

min(ij')-l \ 

L,k{Fk,Dkk ~ I)Dll {DkkFkj - I)Ukj <j = 

.k = l,L^k¥^0,Uk^¥^a I 



3 Construction of the approximation 

We describe the construction of the approximation matrices F^j. We denote the element in position (i, j) 
of a matrix A as A{i,j) and the element in position i of a vector v as v{i). 

The block filtering preconditioner defined in Definitions 12.21 and 12 . 3 1 rea uires the construction of matrices 
Fuj that satisfy the equation that is F^jUkjij = D^k Ukjtj We note in the following Mkjtj = Vkj and 
D^^Ukjtj = Ukj , where Vkj,Ukj are vectors of bk elements. Hence we have FkjVkj — Ukj- In the following, 
for the ease of understanding, we simplify the notation and discuss the relation Fv — u. The approach used 
previously for the construction of F is to compute it as 

F = Diagiu./ v) 

where ./ is pointwise division. 

For sparse matrices, the vector can have zero elements. Possibly u can have only zero elements, but 
this case is simple to solve. We discuss the case of v having zero elements. If v has only zero elements, then 
u is also zero, and hence the relation Fv = m is satisfied. We discuss hence the case when there is at least 
a nonzero element in v. Let j be the index of a nonzero element, that is v{j) ^ 0. If v{i) = 0, then we take 
F{i,j) = u(j)/v{j). In other words, a simple construction of the matrix F is as follows: 



F{i,j) 



u{i)/v{i) if i — j and v{i) ^ 

u(i)lv{2) if v{i) = and j = mmk:v(k)^o \k - i\ 

otherwise 



(11) 



An example of construction of F is as follows: 

/O u{l)/v{2) 
u{2)/v{2) 
u{i)/v{2) 



V 



M(4)/'y(5) 
m(5)/z;(5) 
u\&)/v{h) 0/ 



/ \ 

v{2) 




i;(5) 

V / 



(u{l)\ 
u{2) 
uCi) 
u{A) 

"(5) 
\u{Q)J 



The matrix F can be easily constructed to be symmetric by letting F{j,i) — F{i,j). But F might not 
be SPD. 

We can also use deflation techniques [71 (TUl E] to construct Fkj that satisfies the equation (O , that 
is FkjUkjtj = D'jl^ Ukjtj. Equation ([T^ defines Fkj for a symmetric matrix. 



Fkj 


= P + Q 


P 


= I-QA 


Q 


= ZE-^^Z^ 


E 


- {Z^DkkZ) 


Z 


= Ukj tj 



(12) 
(13) 
(14) 
(15) 
(16) 
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3.1 Suitability for Parallelism 



The block filtering preconditioner was defined in a general way. With a suitable ordering, a parallel pre- 
conditioner can be obtained. In our work, we focus on matrices partitioned using nested dissection. This 
partitioning leads to algorithms that can be implemented in parallel. We describe here briefly this ordering. 
Nested dissection considers the undirected graph G of a symmetric matrix A. It identifies a separator S 
that partitions the graph into two disconnected graphs Gi, G2. The input matrix is permuted such that 
the vertices corresponding to the separator S are ordered after the vertices corresponding to the two dis- 
connected graphs Gi, G2. Then the same partitioning can be applied on the two disconnected graphs, with 
the recursion beind stopped when the number of desired independent parts has been reached. 

Consider a matrix A of size n x n partitioned using nested dissection into a block matrix of size TV x TV. 
The following example displays the result obtained after applying two steps of nested dissection that leads 
to a block matrix of size 7x7. 



PAP' 



The preconditioner M is defined as 



/All 
A31 

\An 



A22 
A32 



A. 



M 



L31 



\L7 



D22 
L32 
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Dii 
L74 



Ds5 
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Dm, 



DttJ 



Ai3 








^17\ 
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A27 
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A3T 














Aid 


A47 












A55 
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As 


Aaa 


A(i7 








A73 


A-J4 


A75 


A-ja 


A77) 












(Du 




Ul3 






Ul7\ 








D22 


U23 






U27 










D33 






U37 




1=1:7 








D44 


U46 


U74 












D55 




U57 














Dae 


Ue,7 
















D77) 



(17) 



where each block of the factors L, Z?, U can be computed following Definitions l2.2l or l2.3l With this partition, 
both the preconditioner and the iterative process can be implemented in parallel. 



4 Conclusions 

In this report we have briefiy presented a block filtering preconditioner AI that is build from an input 
matrix A and a filtering vector t and satisfies the property Alt — At. With an appropriate ordering as 
nested dissection, this preconditioner is suitable for parallel implementations. A future paper will focus on 
numerical results on scalar of PDEs discretized on two-dimensional and three-dimensional structured and 
unstructured grids showing that this method is efficient in practice. 



References 

[1] Y. Achdou and F. Nataf. An iterated tangential filtering decomposition. Numer. Linear Algebra AppL, 
10(5-6):511~-539, 2003. Preconditioning, 2001 (Tahoe City, CA). 

[2] Y. Achdou and F. Nataf. Low frequency tangential filtering decomposition. Numer. Linear Algebra 
AppL, 14(2):129-147, 2007. 

[3] Michele Benzi and Miroslav Tuma. A comparative study of sparse approximate inverse preconditioners. 
Appl. Num. Math., 30:305-340, 1999. 

[4] D. Braess. Towards algebraic multigrid for elliptic problems of second order. Computing, 55(4):379-393, 
1995. 

[5] G. Meurant. Computer solution of large linear systems. North-Holland Publishing Co., Amsterdam, 
1999. 

[6] R. Nabben and C. Vuik. A comparison of abstract versions of deflation, balancing and additive coarse 
grid correction preconditioners. Numer. Linear Algebra Appl., 15(4):355-372, 2008. 



RR n° 7569 



8 



Laura Grigori , Frederic Nataf 



[7] R. A. Nicolaides. Deflation of conjugate gradients with applications to boundary value problems. SIAM 
J. Numer. Anal, 24(2):355-365, 1987. 

[8] J. W. Ruge and K. Stiiben. Algebraic multigrid. In Multigrid methods, volume 3 of Frontiers Appl. 
Math., pages 73-130. SIAM, Philadelphia, PA, 1987. 

[9] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996. 

[10] Y. Saad, M. Yeung, J. Erhel, and F. Guyomarc'h. A deflated version of the conjugate gradient algorithm. 
SIAM J. Sci. Com,put., 21(5):1909-1926 (electronic), 2000. Iterative methods for solving systems of 

algebraic equations (Copper Mountain, CO, 1998). 

[11] K. Stiiben. A review of algebraic multigrid. J. Comput. Appl. Math., 128(l-2):281-309, 2001. Numerical 
analysis 2000, Vol. VII, Partial differential equations. 

[12] J. M. Tang, R. Nabben, C. Vuik, and Y. A. Erlangga. Comparison of two-level preconditioners derived 
from deflation, domain decomposition and multigrid methods. J. Sci. Comput., 39(3):340-370, 2009. 

[13] A. Toselli and O. Widlund. Domain decomposition methods — algorithms and theory, volume 34 of 
Springer Series in Computational Mathematics. Springer- Verlag, Berlin, 2005. 

[14] U. Trottenberg, C. W. Oosterlee, and A. Schiiller. Multigrid. Academic Press Inc., San Diego, CA, 
2001. With contributions by A. Brandt, P. Oswald and K. Stiiben. 

[15] P. Vanck, M. Brczina, and J. Mandc4. Convergence of algebraic multigrid based on smoothed aggrega- 
tion. Numer. Math., 88(3):559 579, 2001. 

[16] C. Wagner. Tangential frequency filtering decompositions for unsymmetric matrices. Numer. Math., 
78(1):143-163, 1997. 

[17] C. Wagner and G. Wittum. Adaptive filtering. Numer. Math., 78(2):305-328, 1997. 



INRIA 




Centre de recherche INRIA Saclay - Ile-de-France 
Pare Orsay Universite - ZAC des Vignes 
4, rue Jacques Monod - 91893 Orsay Cedex (France) 

Centre de recherche INRIA Bordeaux - Sud Quest : Domaine Universitaire - 351, cours de la Liberation - 33405 Talence Cedex 
Centre de recherche INRIA Grenoble - Rhone-Alpes : 655, avenue de I'Europe - 38334 Montbonnot Saint-Ismier 
Centre de recherche INRIA Lille - Nord Europe : Pare Scientifique de la Haute Borne - 40, avenue Halley - 59650 Villeneuve d'Ascq 
Centre de recherche INRIA Nancy - Grand Est : LORIA, Technopole de Nancy-Brabois - Campus scientifique 
615, rue du Jardin Botanique - BP 101 - 54602 Villers-les-Nancy Cedex 
Centre de recherche INRIA Paris - Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex 
Centre de recherche INRIA Rennes - Bretagne Atlantique : IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex 
Centre de recherche INRIA Sophia Antipolis - Mediterranee : 2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex 



Editeur 

INRIA - Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex (France) 

http://www.inria.fr 
ISSN 0249-6399 



